DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2023.1

Question Answering over Linked Data with GPT-3

Authors: Bruno Faria, Dylan Perdigão, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

This paper explores GPT-3 for answering natural language questions over Linked Data. Different engines of the model and different approaches are adopted for answering questions in the QALD-9 dataset, namely: zero and few-shot SPARQL generation, as well as fine-tuning in the training portion of the dataset. Answers retrieved by the generated queries and answers generated directly by the model are also compared. Overall results are generally poor, but several insights are provided on using GPT-3 for the proposed task.

Cite as

Bruno Faria, Dylan Perdigão, and Hugo Gonçalo Oliveira. Question Answering over Linked Data with GPT-3. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{faria_et_al:OASIcs.SLATE.2023.1,
  author =	{Faria, Bruno and Perdig\~{a}o, Dylan and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Question Answering over Linked Data with GPT-3}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{1:1--1:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.1},
  URN =		{urn:nbn:de:0030-drops-185155},
  doi =		{10.4230/OASIcs.SLATE.2023.1},
  annote =	{Keywords: SPARQL Generation, Prompt Engineering, Few-Shot Learning, Question Answering, GPT-3}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.4

Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese

Authors: Hugo Gonçalo Oliveira, Igor Caetano, Renato Matos, and Hugo Amaro

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

In the process of multiple-choice question generation, different methods are often considered for distractor acquisition, as an attempt to cover as many questions as possible. Some, however, result in many candidate distractors of variable quality, while only three or four are necessary. We implement some distractor generation methods for Portuguese and propose their combination and ranking with language models. Experimentation results confirm that this increases both coverage and suitability of the selected distractors.

Cite as

Hugo Gonçalo Oliveira, Igor Caetano, Renato Matos, and Hugo Amaro. Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 4:1-4:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2023.4,
  author =	{Gon\c{c}alo Oliveira, Hugo and Caetano, Igor and Matos, Renato and Amaro, Hugo},
  title =	{{Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{4:1--4:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.4},
  URN =		{urn:nbn:de:0030-drops-185185},
  doi =		{10.4230/OASIcs.SLATE.2023.4},
  annote =	{Keywords: Multiple-Choice Questions, Distractor Generation, Language Models}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.3

Question Answering For Toxicological Information Extraction

Authors: Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Working with large amounts of text data has become hectic and time-consuming. In order to reduce human effort, costs, and make the process more efficient, companies and organizations resort to intelligent algorithms to automate and assist the manual work. This problem is also present in the field of toxicological analysis of chemical substances, where information needs to be searched from multiple documents. That said, we propose an approach that relies on Question Answering for acquiring information from unstructured data, in our case, English PDF documents containing information about physicochemical and toxicological properties of chemical substances. Experimental results confirm that our approach achieves promising results which can be applicable in the business scenario, especially if further revised by humans.

Cite as

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva. Question Answering For Toxicological Information Extraction. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ferreira_et_al:OASIcs.SLATE.2022.3,
  author =	{Ferreira, Bruno Carlos Lu{\'\i}s and Gon\c{c}alo Oliveira, Hugo and Amaro, Hugo and Laranjeiro, \^{A}ngela and Silva, Catarina},
  title =	{{Question Answering For Toxicological Information Extraction}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.3},
  URN =		{urn:nbn:de:0030-drops-167493},
  doi =		{10.4230/OASIcs.SLATE.2022.3},
  annote =	{Keywords: Information Extraction, Question Answering, Transformers, Toxicological Analysis}
}

@InProceedings{ferreira_et_al:OASIcs.SLATE.2022.3,
  author =	{Ferreira, Bruno Carlos Lu{\'\i}s and Gon\c{c}alo Oliveira, Hugo and Amaro, Hugo and Laranjeiro, \^{A}ngela and Silva, Catarina},
  title =	{{Question Answering For Toxicological Information Extraction}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.3},
  URN =		{urn:nbn:de:0030-drops-167493},
  doi =		{10.4230/OASIcs.SLATE.2022.3},
  annote =	{Keywords: Information Extraction, Question Answering, Transformers, Toxicological Analysis}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.19

Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs

Authors: Hugo Gonçalo Oliveira, Sara Inácio, and Catarina Silva

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Following the current interest in developing automatic question answering systems, we analyse alternative approaches for finding suitable answers from a list of Frequently Asked Questions (FAQs), in Portuguese. These rely on different technologies, some more established and others more recent, and are all easily adaptable to new lists of FAQs, on new domains. We analyse the effort required for their configuration, the accuracy of their answers, and the time they take to get such answers. We conclude that traditional Information Retrieval (IR) can be a solution for smaller lists of FAQs, but approaches based on deep neural networks for sentence encoding are at least as reliable and less dependent on the number and complexity of the FAQs. We also contribute with a small dataset of Portuguese FAQs on the domain of telecommunications, which was used in our experiments.

Cite as

Hugo Gonçalo Oliveira, Sara Inácio, and Catarina Silva. Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 19:1-19:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2022.19,
  author =	{Gon\c{c}alo Oliveira, Hugo and In\'{a}cio, Sara and Silva, Catarina},
  title =	{{Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{19:1--19:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.19},
  URN =		{urn:nbn:de:0030-drops-167652},
  doi =		{10.4230/OASIcs.SLATE.2022.19},
  annote =	{Keywords: Natural Language Processing, Portuguese, Question Answering, FAQs, Information Retrieval, Sentence Encoding, Transformers}
}

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2022.19,
  author =	{Gon\c{c}alo Oliveira, Hugo and In\'{a}cio, Sara and Silva, Catarina},
  title =	{{Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{19:1--19:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.19},
  URN =		{urn:nbn:de:0030-drops-167652},
  doi =		{10.4230/OASIcs.SLATE.2022.19},
  annote =	{Keywords: Natural Language Processing, Portuguese, Question Answering, FAQs, Information Retrieval, Sentence Encoding, Transformers}
}

Document

DOI: 10.4230/OASIcs.LDK.2021.21

On the Utility of Word Embeddings for Enriching OpenWordNet-PT

Authors: Hugo Gonçalo Oliveira, Fredson Silva de Souza Aguiar, and Alexandre Rademaker

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)

Abstract

The maintenance of wordnets and lexical knwoledge bases typically relies on time-consuming manual effort. In order to minimise this issue, we propose the exploitation of models of distributional semantics, namely word embeddings learned from corpora, in the automatic identification of relation instances missing in a wordnet. Analogy-solving methods are first used for learning a set of relations from analogy tests focused on each relation. Despite their low accuracy, we noted that a portion of the top-given answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances, as we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some are automatically invalidated.

Cite as

Hugo Gonçalo Oliveira, Fredson Silva de Souza Aguiar, and Alexandre Rademaker. On the Utility of Word Embeddings for Enriching OpenWordNet-PT. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 21:1-21:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.LDK.2021.21,
  author =	{Gon\c{c}alo Oliveira, Hugo and Aguiar, Fredson Silva de Souza and Rademaker, Alexandre},
  title =	{{On the Utility of Word Embeddings for Enriching OpenWordNet-PT}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{21:1--21:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.21},
  URN =		{urn:nbn:de:0030-drops-145578},
  doi =		{10.4230/OASIcs.LDK.2021.21},
  annote =	{Keywords: word embeddings, lexical resources, wordnet, analogy tests}
}

@InProceedings{goncalooliveira_et_al:OASIcs.LDK.2021.21,
  author =	{Gon\c{c}alo Oliveira, Hugo and Aguiar, Fredson Silva de Souza and Rademaker, Alexandre},
  title =	{{On the Utility of Word Embeddings for Enriching OpenWordNet-PT}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{21:1--21:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.21},
  URN =		{urn:nbn:de:0030-drops-145578},
  doi =		{10.4230/OASIcs.LDK.2021.21},
  annote =	{Keywords: word embeddings, lexical resources, wordnet, analogy tests}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.7

MUAHAH: Taking the Most out of Simple Conversational Agents

Authors: Leonor Llansol, João Santos, Luís Duarte, José Santos, Mariana Gaspar, Ana Alves, Hugo Gonçalo Oliveira, and Luísa Coheur

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

Dialog engines based on multi-agent architectures usually select a single agent, deemed to be the most suitable for a given scenario or for responding to a specific request, and disregard the answers from all of the other available agents. In this work, we present a multi-agent plug-and-play architecture that: (i) enables the integration of different agents; (ii) includes a decision maker module, responsible for selecting a suitable answer out of the responses of different agents. As usual, a single agent can be chosen to provide the final answer, but the latter can also be obtained from the responses of several agents, according to a voting scheme. We also describe three case studies in which we test several agents and decision making strategies; and show how new agents and a new decision strategy can be easily plugged in and take advantage of this platform in different ways. Experimentation also confirms that considering several agents contributes to better responses.

Cite as

Leonor Llansol, João Santos, Luís Duarte, José Santos, Mariana Gaspar, Ana Alves, Hugo Gonçalo Oliveira, and Luísa Coheur. MUAHAH: Taking the Most out of Simple Conversational Agents. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{llansol_et_al:OASIcs.SLATE.2021.7,
  author =	{Llansol, Leonor and Santos, Jo\~{a}o and Duarte, Lu{\'\i}s and Santos, Jos\'{e} and Gaspar, Mariana and Alves, Ana and Gon\c{c}alo Oliveira, Hugo and Coheur, Lu{\'\i}sa},
  title =	{{MUAHAH: Taking the Most out of Simple Conversational Agents}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{7:1--7:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.7},
  URN =		{urn:nbn:de:0030-drops-144248},
  doi =		{10.4230/OASIcs.SLATE.2021.7},
  annote =	{Keywords: Dialog systems, question answering, information retrieval, multi-agent}
}

@InProceedings{llansol_et_al:OASIcs.SLATE.2021.7,
  author =	{Llansol, Leonor and Santos, Jo\~{a}o and Duarte, Lu{\'\i}s and Santos, Jos\'{e} and Gaspar, Mariana and Alves, Ana and Gon\c{c}alo Oliveira, Hugo and Coheur, Lu{\'\i}sa},
  title =	{{MUAHAH: Taking the Most out of Simple Conversational Agents}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{7:1--7:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.7},
  URN =		{urn:nbn:de:0030-drops-144248},
  doi =		{10.4230/OASIcs.SLATE.2021.7},
  annote =	{Keywords: Dialog systems, question answering, information retrieval, multi-agent}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.9

Exploring Different Methods for Solving Analogies with Portuguese Word Embeddings

Authors: Tiago Sousa, Hugo Gonçalo Oliveira, and Ana Alves

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

A common way of assessing static word embeddings is to use them for solving analogies of the kind "what is to king as man is to woman?". For this purpose, the vector offset method (king - man + woman = queen), also known as 3CosAdd, has been effectively used for solving analogies and assessing different models of word embeddings in different languages. However, some researchers pointed out that this method is not the most effective for this purpose. Following this, we tested alternative analogy solving methods (3CosMul, 3CosAvg, LRCos) in Portuguese word embeddings and confirmed the previous statement. Specifically, those methods are used to answer the Portuguese version of the Google Analogy Test, dubbed LX-4WAnalogies, which covers syntactic and semantic analogies of different kinds. We discuss the accuracy of different methods applied to different models of embeddings and take some conclusions. Indeed, all methods outperform 3CosAdd, and the best performance is consistently achieved with LRCos, in GloVe.

Cite as

Tiago Sousa, Hugo Gonçalo Oliveira, and Ana Alves. Exploring Different Methods for Solving Analogies with Portuguese Word Embeddings. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{sousa_et_al:OASIcs.SLATE.2020.9,
  author =	{Sousa, Tiago and Gon\c{c}alo Oliveira, Hugo and Alves, Ana},
  title =	{{Exploring Different Methods for Solving Analogies with Portuguese Word Embeddings}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{9:1--9:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.9},
  URN =		{urn:nbn:de:0030-drops-130229},
  doi =		{10.4230/OASIcs.SLATE.2020.9},
  annote =	{Keywords: analogies, word embeddings, semantic relations, syntactic relations, Portuguese}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2020.16

Assessing Factoid Question-Answer Generation for Portuguese (Short Paper)

Authors: João Ferreira, Ricardo Rodrigues, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

We present work on the automatic generation of question-answer pairs in Portuguese, useful, for instance, for populating the knowledge-base of question-answering systems. This includes: (i) a new corpus of close to 600 factoid sentences, manually created from an existing corpus of questions and answers, used as our benchmark; (ii) two approaches for the automatic generation of question-answer pairs, which can be seen as baselines; (iii) results of those approaches in the corpus.

Cite as

João Ferreira, Ricardo Rodrigues, and Hugo Gonçalo Oliveira. Assessing Factoid Question-Answer Generation for Portuguese (Short Paper). In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 16:1-16:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{ferreira_et_al:OASIcs.SLATE.2020.16,
  author =	{Ferreira, Jo\~{a}o and Rodrigues, Ricardo and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Assessing Factoid Question-Answer Generation for Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{16:1--16:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.16},
  URN =		{urn:nbn:de:0030-drops-130298},
  doi =		{10.4230/OASIcs.SLATE.2020.16},
  annote =	{Keywords: Question-Answer Generation, Corpus, NLP, Portuguese}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2019

OASIcs, Volume 74, SLATE'19, Complete Volume

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

OASIcs, Volume 74, SLATE'19, Complete Volume

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@Proceedings{rodrigues_et_al:OASIcs.SLATE.2019,
  title =	{{OASIcs, Volume 74, SLATE'19, Complete Volume}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019},
  URN =		{urn:nbn:de:0030-drops-109008},
  doi =		{10.4230/OASIcs.SLATE.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Software and its engineering, Compilers; Information systems, World Wide Web}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2019.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2019.0,
  author =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{0:i--0:xviii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.0},
  URN =		{urn:nbn:de:0030-drops-108679},
  doi =		{10.4230/OASIcs.SLATE.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2019.0,
  author =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{0:i--0:xviii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.0},
  URN =		{urn:nbn:de:0030-drops-108679},
  doi =		{10.4230/OASIcs.SLATE.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.2

Using Lucene for Developing a Question-Answering Agent in Portuguese

Authors: Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Given the limitations of available platforms for creating conversational agents, and that a question-answering agent suffices in many scenarios, we take advantage of the Information Retrieval library Lucene for developing such an agent for Portuguese. The solution described answers natural language questions based on an indexed list of FAQs. Its adaptation to different domains is a matter of changing the underlying list. Different configurations of this solution, mostly on the language analysis level, resulted in different search strategies, which were tested for answering questions about the economic activity in Portugal. In addition to comparing the different search strategies, we concluded that, towards better answers, it is fruitful to combine the results of different strategies with a voting method.

Cite as

Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves. Using Lucene for Developing a Question-Answering Agent in Portuguese. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2019.2,
  author =	{Gon\c{c}alo Oliveira, Hugo and Filipe, Ricardo and Rodrigues, Ricardo and Alves, Ana},
  title =	{{Using Lucene for Developing a Question-Answering Agent in Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.2},
  URN =		{urn:nbn:de:0030-drops-108692},
  doi =		{10.4230/OASIcs.SLATE.2019.2},
  annote =	{Keywords: information retrieval, question answering, natural language interface, natural language processing, natural language understanding}
}

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2019.2,
  author =	{Gon\c{c}alo Oliveira, Hugo and Filipe, Ricardo and Rodrigues, Ricardo and Alves, Ana},
  title =	{{Using Lucene for Developing a Question-Answering Agent in Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.2},
  URN =		{urn:nbn:de:0030-drops-108692},
  doi =		{10.4230/OASIcs.SLATE.2019.2},
  annote =	{Keywords: information retrieval, question answering, natural language interface, natural language processing, natural language understanding}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.18

Improving NLTK for Processing Portuguese

Authors: João Ferreira, Hugo Gonçalo Oliveira, and Ricardo Rodrigues

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Python has a growing community of users, especially in the AI and ML fields. Yet, Computational Processing of Portuguese in this programming language is limited, in both available tools and results. This paper describes NLPyPort, a NLP pipeline in Python, primarily based on NLTK, and focused on Portuguese. It is mostly assembled from pre-existent resources or their adaptations, but improves over the performance of existing alternatives in Python, namely in the tasks of tokenization, PoS tagging, lemmatization and NER.

Cite as

João Ferreira, Hugo Gonçalo Oliveira, and Ricardo Rodrigues. Improving NLTK for Processing Portuguese. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 18:1-18:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{ferreira_et_al:OASIcs.SLATE.2019.18,
  author =	{Ferreira, Jo\~{a}o and Gon\c{c}alo Oliveira, Hugo and Rodrigues, Ricardo},
  title =	{{Improving NLTK for Processing Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{18:1--18:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.18},
  URN =		{urn:nbn:de:0030-drops-108852},
  doi =		{10.4230/OASIcs.SLATE.2019.18},
  annote =	{Keywords: NLP, Tokenization, PoS tagging, Lemmatization, Named Entity Recognition}
}

Document

DOI: 10.4230/OASIcs.SLATE.2018.12

ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese

Authors: Ana Alves, Hugo Gonçalo Oliveira, Ricardo Rodrigues, and Rui Encarnação

Published in: OASIcs, Volume 62, 7th Symposium on Languages, Applications and Technologies (SLATE 2018)

Abstract

Semantic Textual Similarity (STS) aims at computing the proximity of meaning transmitted by two sentences. In 2016, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes the development of ASAPP, a system that participated in ASSIN, but has been improved since then, and now achieves the best results in this task. ASAPP learns a STS function from a broad range of lexical, syntactic, semantic and distributional features. This paper describes the features used in the current version of ASAPP, and how they are exploited in a regression algorithm to achieve the best published results for ASSIN to date, in both European and Brazilian Portuguese.

Cite as

Ana Alves, Hugo Gonçalo Oliveira, Ricardo Rodrigues, and Rui Encarnação. ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese. In 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Open Access Series in Informatics (OASIcs), Volume 62, pp. 12:1-12:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{alves_et_al:OASIcs.SLATE.2018.12,
  author =	{Alves, Ana and Gon\c{c}alo Oliveira, Hugo and Rodrigues, Ricardo and Encarna\c{c}\~{a}o, Rui},
  title =	{{ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese}},
  booktitle =	{7th Symposium on Languages, Applications and Technologies (SLATE 2018)},
  pages =	{12:1--12:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-072-9},
  ISSN =	{2190-6807},
  year =	{2018},
  volume =	{62},
  editor =	{Henriques, Pedro Rangel and Leal, Jos\'{e} Paulo and Leit\~{a}o, Ant\'{o}nio Menezes and Guinovart, Xavier G\'{o}mez},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2018.12},
  URN =		{urn:nbn:de:0030-drops-92709},
  doi =		{10.4230/OASIcs.SLATE.2018.12},
  annote =	{Keywords: natural language processing, semantic textual similarity, semantic relations, word embeddings, character n-grams, supervised machine learning}
}

@InProceedings{alves_et_al:OASIcs.SLATE.2018.12,
  author =	{Alves, Ana and Gon\c{c}alo Oliveira, Hugo and Rodrigues, Ricardo and Encarna\c{c}\~{a}o, Rui},
  title =	{{ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese}},
  booktitle =	{7th Symposium on Languages, Applications and Technologies (SLATE 2018)},
  pages =	{12:1--12:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-072-9},
  ISSN =	{2190-6807},
  year =	{2018},
  volume =	{62},
  editor =	{Henriques, Pedro Rangel and Leal, Jos\'{e} Paulo and Leit\~{a}o, Ant\'{o}nio Menezes and Guinovart, Xavier G\'{o}mez},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2018.12},
  URN =		{urn:nbn:de:0030-drops-92709},
  doi =		{10.4230/OASIcs.SLATE.2018.12},
  annote =	{Keywords: natural language processing, semantic textual similarity, semantic relations, word embeddings, character n-grams, supervised machine learning}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2018.18

NLPPort: A Pipeline for Portuguese NLP (Short Paper)

Authors: Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes

Published in: OASIcs, Volume 62, 7th Symposium on Languages, Applications and Technologies (SLATE 2018)

Abstract

Although there are tools for some the most common natural language processing tasks in Portuguese, there is a lack of available cross-platform tools specifically targeted for Portuguese, from end to end, namely for integration in projects developed in Java. To address this issue, we have developed and tweaked, over the last half-dozen years, NLPPort, a set of tools that can be used in a pipelined fashion, which we have made publicly available. In this paper, we present the major features of such set of tools.

Cite as

Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes. NLPPort: A Pipeline for Portuguese NLP (Short Paper). In 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Open Access Series in Informatics (OASIcs), Volume 62, pp. 18:1-18:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2018.18,
  author =	{Rodrigues, Ricardo and Gon\c{c}alo Oliveira, Hugo and Gomes, Paulo},
  title =	{{NLPPort: A Pipeline for Portuguese NLP}},
  booktitle =	{7th Symposium on Languages, Applications and Technologies (SLATE 2018)},
  pages =	{18:1--18:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-072-9},
  ISSN =	{2190-6807},
  year =	{2018},
  volume =	{62},
  editor =	{Henriques, Pedro Rangel and Leal, Jos\'{e} Paulo and Leit\~{a}o, Ant\'{o}nio Menezes and Guinovart, Xavier G\'{o}mez},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2018.18},
  URN =		{urn:nbn:de:0030-drops-92768},
  doi =		{10.4230/OASIcs.SLATE.2018.18},
  annote =	{Keywords: natural language processing, tools, Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2017.12

A REST Service for Poetry Generation

Authors: Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

This paper describes a REST API developed on the top of PoeTryMe, a poetry generation platform. This API exposes several functionalities, from the production of full poems, to narrower tasks, having in mind their utility for poetry composition, including the acquisition of well-formed lines, or semantically-related words, possibly constrained by the number of syllables, rhyme, or polarity. Examples that illustrate the endpoints and what they can be used for are also revealed.

Cite as

Hugo Gonçalo Oliveira. A REST Service for Poetry Generation. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 12:1-12:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira:OASIcs.SLATE.2017.12,
  author =	{Gon\c{c}alo Oliveira, Hugo},
  title =	{{A REST Service for Poetry Generation}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{12:1--12:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.12},
  URN =		{urn:nbn:de:0030-drops-79468},
  doi =		{10.4230/OASIcs.SLATE.2017.12},
  annote =	{Keywords: REST, creative web services, poetry generation, computational creativity}
}

Document

DOI: 10.4230/OASIcs.SLATE.2017.16

Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases

Authors: Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

There are currently several lexical-semantic knowledge bases (LKBs) for Portuguese, developed by different teams and following different approaches. In this paper, the open Portuguese LKBs are briefly analysed, with a focus on size and overlapping contents, and new LKBs are created from their redundant information. Existing and new LKBs are then exploited in the performance of semantic analysis tasks and their performance is compared. Results confirm that, instead of selecting a single LKB to use, it is worth combining all the open Portuguese LKBs.

Cite as

Hugo Gonçalo Oliveira. Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira:OASIcs.SLATE.2017.16,
  author =	{Gon\c{c}alo Oliveira, Hugo},
  title =	{{Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{16:1--16:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.16},
  URN =		{urn:nbn:de:0030-drops-79505},
  doi =		{10.4230/OASIcs.SLATE.2017.16},
  annote =	{Keywords: Lexical Knowledge Bases, Portuguese, WordNet, Redundancy, Semantic Similarity}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2016

OASIcs, Volume 51, SLATE'16, Complete Volume

Authors: Marjan Mernik, José Paulo Leal, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 51, 5th Symposium on Languages, Applications and Technologies (SLATE'16) (2016)

Abstract

OASIcs, Volume 51, SLATE'16, Complete Volume

Cite as

5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@Proceedings{mernik_et_al:OASIcs.SLATE.2016,
  title =	{{OASIcs, Volume 51, SLATE'16, Complete Volume}},
  booktitle =	{5th Symposium on Languages, Applications and Technologies (SLATE'16)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-006-4},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{51},
  editor =	{Mernik, Marjan and Leal, Jos\'{e} Paulo and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2016},
  URN =		{urn:nbn:de:0030-drops-60617},
  doi =		{10.4230/OASIcs.SLATE.2016},
  annote =	{Keywords: Natural Language Processing, Programming Languages}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2016.0

Front Matter, Table of Contents, Preface, Program Committee, List of Authors

Authors: Marjan Mernik, José Paulo Leal, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 51, 5th Symposium on Languages, Applications and Technologies (SLATE'16) (2016)

Abstract

Front Matter, Table of Contents, Preface, Program Committee, List of Authors

Cite as

5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 0:i-0:xiv, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{mernik_et_al:OASIcs.SLATE.2016.0,
  author =	{Mernik, Marjan and Leal, Jos\'{e} Paulo and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Front Matter, Table of Contents, Preface, Program Committee, List of Authors}},
  booktitle =	{5th Symposium on Languages, Applications and Technologies (SLATE'16)},
  pages =	{0:i--0:xiv},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-006-4},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{51},
  editor =	{Mernik, Marjan and Leal, Jos\'{e} Paulo and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2016.0},
  URN =		{urn:nbn:de:0030-drops-60052},
  doi =		{10.4230/OASIcs.SLATE.2016.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Program Committee, List of Authors}
}

Document

DOI: 10.4230/OASIcs.SLATE.2016.3

Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text

Authors: Alexandre Pinto, Hugo Gonçalo Oliveira, and Ana Oliveira Alves

Published in: OASIcs, Volume 51, 5th Symposium on Languages, Applications and Technologies (SLATE'16) (2016)

Abstract

Nowadays, there are many toolkits available for performing common natural language processing tasks, which enable the development of more powerful applications without having to start from scratch. In fact, for English, there is no need to develop tools such as tokenizers, part-of-speech (POS) taggers, chunkers or named entity recognizers (NER). The current challenge is to select which one to use, out of the range of available tools. This choice may depend on several aspects, including the kind and source of text, where the level, formal or informal, may influence the performance of such tools. In this paper, we assess a range of natural language processing toolkits with their default configuration, while performing a set of standard tasks (e.g. tokenization, POS tagging, chunking and NER), in popular datasets that cover newspaper and social network text. The obtained results are analyzed and, while we could not decide on a single toolkit, this exercise was very helpful to narrow our choice.

Cite as

Alexandre Pinto, Hugo Gonçalo Oliveira, and Ana Oliveira Alves. Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text. In 5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{pinto_et_al:OASIcs.SLATE.2016.3,
  author =	{Pinto, Alexandre and Gon\c{c}alo Oliveira, Hugo and Oliveira Alves, Ana},
  title =	{{Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text}},
  booktitle =	{5th Symposium on Languages, Applications and Technologies (SLATE'16)},
  pages =	{3:1--3:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-006-4},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{51},
  editor =	{Mernik, Marjan and Leal, Jos\'{e} Paulo and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2016.3},
  URN =		{urn:nbn:de:0030-drops-60086},
  doi =		{10.4230/OASIcs.SLATE.2016.3},
  annote =	{Keywords: Natural language processing, toolkits, formal text, social media, benchmark}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.169

Assigning Polarity Automatically to the Synsets of a Wordnet-like Resource

Authors: Hugo Gonçalo Oliveira, António Paulo Santos, and Paulo Gomes

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

This article describes work towards the automatic creation of a conceptual polarity lexicon for Portuguese. For this purpose, we take advantage of a polarity lexicon based on single lemmas to assign polarities to the synsets of a wordnet-like resource. We assume that each synset has the polarity of the majority of its lemmas, given by the initial lexicon. After that, polarity is propagated to other synsets, through different types of semantic relations. The relation types used were selected after manual evaluation. The main result of this work is a lexicon with more than 10,000 synsets with an assigned polarity, with accuracy of 70% or 79%, depending on the human evaluator. For Portuguese, this is the first synset-based polarity lexicon we are aware of. In addition to this contribution, the presented approach can be applied to create similar resources for other languages.

Cite as

Hugo Gonçalo Oliveira, António Paulo Santos, and Paulo Gomes. Assigning Polarity Automatically to the Synsets of a Wordnet-like Resource. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 169-184, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2014.169,
  author =	{Gon\c{c}alo Oliveira, Hugo and Santos, Ant\'{o}nio Paulo and Gomes, Paulo},
  title =	{{Assigning Polarity Automatically to the Synsets of a Wordnet-like Resource}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{169--184},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.169},
  URN =		{urn:nbn:de:0030-drops-45689},
  doi =		{10.4230/OASIcs.SLATE.2014.169},
  annote =	{Keywords: sentiment analysis, polarity, lexicon, wordnet, Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.267

LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese

Authors: Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

Although lemmatization is a very common subtask in many natural language processing tasks, there is a lack of available true cross-platform lemmatization tools specifically targeted for Portuguese, namely for integration in projects developed in Java. To address this issue, we have developed a lemmatizer, initially just for our own use, but which we have decided to make publicly available. The lemmatizer, presented in this document, yields an overall accuracy over 98% when compared against a manually revised corpus.

Cite as

Ricardo Rodrigues, Hugo Gonçalo Oliveira, and Paulo Gomes. LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 267-274, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2014.267,
  author =	{Rodrigues, Ricardo and Gon\c{c}alo Oliveira, Hugo and Gomes, Paulo},
  title =	{{LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{267--274},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.267},
  URN =		{urn:nbn:de:0030-drops-45753},
  doi =		{10.4230/OASIcs.SLATE.2014.267},
  annote =	{Keywords: lemmatization, normalization, rules, lexicon}
}

Search Results

Documents authored by Gonçalo Oliveira, Hugo

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message